Abstract: This study has been carried out in order to determine
cost-effective configurations of functional units for multiple-issue 
out-of-order superscalar processors. The trace-driven simulations were 
performed on the six integer and the fourteen floating-point programs from 
the SPEC 92 suite. We first evaluate the number of instructions allowed to 
be concurrently processed by the execution stages of the pipeline. We 
then apply some restrictions on the execution issue of different 
instruction classes in order to define these configurations. We conclude 
that five to nine functional units are necessary to exploit 
Instruction-Level Parallelism. An important point is that several data 
cache ports are required in a processor of degree 4 or more. Finally, we 
report on complementary results on the utilization rate of the functional 
units. (Author abstract) 15 Refs. 
Abstract: As microprocessors become faster, the relative performance cost
of memory accesses increases. Bigger and faster caches significantly 
reduce the absolute load-to-use time delay. However, increases in processor
operating frequencies worsen the relative load-to-use latency, measured
in processor cycles (e.g. from two cycles on the Pentium processor to three
cycles or more in current designs). Load-address prediction techniques were
introduced to partially cut the load-to-use latency. This paper focuses on 
advanced address-prediction schemes to further shorten program execution 
time. Existing address prediction schemes are capable of predicting simple 
address patterns, consisting mainly of constant addresses or stride-based 
addresses. This paper explores the characteristics of the remaining loads 
and suggests new enhanced techniques to improve prediction effectiveness:
context-based prediction to tackle part of the remaining,
difficult-to-predict load instructions; new prediction algorithms to take
advantage of global correlation among different static loads; new
confidence mechanisms to increase the correct prediction rate and to
eliminate costly mispredictions; and mechanisms to prevent long or random
address sequences from polluting the predictor data structures while
providing some hysteresis behavior to the predictions. Such an enhanced
address predictor accurately predicts 67% of all loads, while keeping the 
misprediction rate close to 1%. We further prove that the proposed predictor
works reasonably well in a deeply pipelined architecture where the
predict-to-update delay may significantly impair both prediction rate and 
accuracy. (12 Refs) 
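The stride-based prediction and confidence mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's predictor: the table layout, the names `Entry`, `StridePredictor` and `simulate`, and the counter parameters (`threshold`, `max_conf`) are all assumptions chosen for illustration. It shows only the baseline idea the abstract builds on: predict `last address + stride` per static load, and issue a prediction only once a saturating counter indicates the stride has repeated.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    last_addr: int = 0   # most recent address seen for this load
    stride: int = 0      # difference between the last two addresses
    confidence: int = 0  # saturating counter; reset when the stride changes


class StridePredictor:
    """Illustrative per-load stride predictor (hypothetical structure)."""

    def __init__(self, threshold: int = 2, max_conf: int = 3):
        self.table = {}             # load PC -> Entry
        self.threshold = threshold  # minimum confidence to issue a prediction
        self.max_conf = max_conf    # saturation value of the counter

    def predict(self, pc):
        """Return a predicted address, or None when confidence is too low."""
        entry = self.table.get(pc)
        if entry is None or entry.confidence < self.threshold:
            return None
        return entry.last_addr + entry.stride

    def update(self, pc, addr):
        """Train the entry for this load with the address it actually used."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = Entry(last_addr=addr)
            return
        new_stride = addr - entry.last_addr
        if new_stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, self.max_conf)
        else:
            entry.confidence = 0
            entry.stride = new_stride
        entry.last_addr = addr


def simulate(trace, predictor):
    """Replay (pc, address) pairs; return (correct, wrong) predictions issued."""
    correct = wrong = 0
    for pc, addr in trace:
        guess = predictor.predict(pc)
        if guess is not None:
            if guess == addr:
                correct += 1
            else:
                wrong += 1
        predictor.update(pc, addr)
    return correct, wrong
```

In this sketch a constant address is simply a stride of zero, so both simple pattern classes named in the abstract are covered by the same entry. Withholding predictions until the counter reaches the threshold trades coverage for a lower misprediction rate, which is the trade-off the confidence mechanisms in the abstract are aimed at.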
Abstract: Due to technology's evolution, the number of transistors that
can be integrated on a single chip has become, at the dawn of the 21st
century, more than sufficient to implement simple superscalar cores. This
excess, nowadays generally used for on-chip caches, can also be utilized
to improve the core's performance, but mainly to increase the core's
superscalar degree. Although it now seems that a high degree is not
justified, it could become useful in the future with progress in
compilation. Setting out from this observation, we describe a new
superscalar architecture with a high out-of-order issue rate. This
architecture implements, in particular, precise interrupt management and
multiple branch prediction. Furthermore, the architecture's specification
takes hardware implementation into account, and thus the temporal
matching of the pipeline's stages. This leads to a finer partitioning of
the pipeline, hence the additional superpipeline label. (18 Refs)
Abstract: A basic rule in computer architecture is that a processor cannot
execute an application faster than it fetches its instructions. This paper
presents a
novel cost-effective mechanism called the two-block ahead branch predictor. 
Information from the current instruction block is not used for predicting 
the address of the next instruction block, but rather for predicting the 
block following the next instruction block. This approach overcomes the 
instruction fetch bottleneck exhibited by wide-dispatch 'brainiac'
processors by enabling them to efficiently predict addresses of two
instruction blocks in a single cycle. Furthermore, the authors' predictor
also allows the branch prediction process to be pipelined in 'speed demon'
processors, to achieve a higher clock rate or to improve the prediction
accuracy by means of bigger prediction structures. Moreover, and unlike
the previously proposed multiple predictor schemes, multiple-block
ahead branch predictors can use any of the branch prediction schemes to
perform the very accurate predictions required to achieve high performance
on superscalar processors.
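The core idea of predicting the block after the next one can be sketched as follows. This is a deliberately simplified illustration, not the mechanism from the paper: it assumes a plain last-value table keyed by block address, idealized immediate training, and hypothetical names (`TwoBlockAheadTable`, `run`). For a fetch sequence A, B, C, the entry for A stores C, so that while A is being fetched the address of C is already available, with B having been predicted one step earlier.

```python
class TwoBlockAheadTable:
    """Illustrative table: current block address -> block fetched two steps later."""

    def __init__(self):
        self.table = {}

    def predict(self, block):
        """Return the predicted address of the block after the next one, or None."""
        return self.table.get(block)

    def train(self, block, two_ahead):
        """Record which block actually followed two steps after `block`."""
        self.table[block] = two_ahead


def run(trace, predictor):
    """Replay a sequence of fetched block addresses.

    Returns (hits, made): correct predictions and predictions issued.
    Training is idealized: the entry is updated as soon as the outcome
    is known in the trace, ignoring any pipeline update delay.
    """
    hits = made = 0
    for i in range(len(trace) - 2):
        guess = predictor.predict(trace[i])
        if guess is not None:
            made += 1
            hits += guess == trace[i + 2]
        predictor.train(trace[i], trace[i + 2])
    return hits, made
```

On a repeating three-block loop the table warms up after one iteration and then supplies the block two steps ahead on every fetch. A real design would replace the last-value table with an actual branch prediction scheme, which the abstract says the multiple-block ahead approach permits.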